Clustering of gene locations

Authors

  • Eli Walters
  • Naomi S. Altman
  • Laura Elnitski
Abstract

Genes that are more closely spaced on the chromosome than expected by chance are said to be spatially clustered. Standard tests of clustering versus uniformity do not take into account two important features of genes: the high variability of gene length and the low probability that gene locations overlap (exclusion). We show by simulation that the standard null distributions, which ignore length and exclusion, do not appropriately approximate the true null distributions of standard tests such as the chi-squared test. We therefore recommend bootstrap sampling to estimate the null distributions. Simulations demonstrate that the chi-squared goodness-of-fit test is a more powerful test of clustering than two other commonly used tests (Kolmogorov and Cramer–von Mises) when the distribution of gene lengths and locations is modeled by a mixture of exponentials and there is a single cluster. The chi-squared test requires binning the gene locations; the number of genes in each bin can then be compared to the expected maximum number under random distribution to determine the location of gene clusters and gene deserts. The bootstrap method to test clustering is illustrated using data from human chromosome 22. © 2005 Elsevier B.V. All rights reserved.
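The idea of estimating the null distribution by resampling, rather than relying on the textbook chi-squared distribution, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the function names (`place_genes`, `chi_squared_stat`, `bootstrap_p_value`), the choice to bin genes by start position, and the way non-overlapping genes are placed (uniform gaps in the free space) are all hypothetical choices made here for the example.

```python
import random


def place_genes(lengths, chrom_len, rng):
    """Place genes at random without overlap; return sorted start positions.

    Assumption: a uniform layout is obtained by drawing sorted offsets in the
    free (non-genic) space and then shifting each gene past the ones before it.
    """
    lengths = list(lengths)
    rng.shuffle(lengths)  # random order of genes along the chromosome
    free = chrom_len - sum(lengths)
    if free < 0:
        raise ValueError("genes do not fit on the chromosome")
    offsets = sorted(rng.uniform(0, free) for _ in lengths)
    starts, used = [], 0.0
    for off, length in zip(offsets, lengths):
        starts.append(off + used)  # shift by total length already placed
        used += length
    return starts


def chi_squared_stat(starts, chrom_len, n_bins):
    """Chi-squared goodness-of-fit statistic against equal counts per bin."""
    counts = [0] * n_bins
    for s in starts:
        idx = min(int(s / chrom_len * n_bins), n_bins - 1)
        counts[idx] += 1
    expected = len(starts) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


def bootstrap_p_value(starts, lengths, chrom_len, n_bins=20, n_boot=999, seed=0):
    """Estimate a p-value from a simulated null that respects length and exclusion."""
    rng = random.Random(seed)
    observed = chi_squared_stat(starts, chrom_len, n_bins)
    exceed = 0
    for _ in range(n_boot):
        # Resample gene lengths with replacement from the observed lengths.
        # place_genes raises ValueError if a resample is too long to fit.
        boot_lengths = rng.choices(lengths, k=len(lengths))
        null_starts = place_genes(boot_lengths, chrom_len, rng)
        if chi_squared_stat(null_starts, chrom_len, n_bins) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)
```

Calling `bootstrap_p_value(starts, lengths, chrom_len)` with observed gene start positions and lengths returns the fraction of simulated layouts whose statistic is at least as large as the observed one. The per-bin counts computed inside `chi_squared_stat` are the same quantities one would inspect to locate candidate clusters and deserts.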

Similar resources

Clustering of a Number of Genes Affecting in Milk Production using Information Theory and Mutual Information

Information theory is a branch of mathematics. Information theory is used in genetic and bioinformatics analyses and can be used for many analyses related to the biological structures and sequences. Bio-computational grouping of genes facilitates genetic analysis, sequencing and structural-based analyses. In this study, after retrieving gene and exon DNA sequences affecting milk yield in dairy ...

Study of genetic diversity in pomegranate germplasm of Yazd province of Iran

A total of 117 pomegranate genotypes collected from different areas of Yazd province of Iran were studied for genetic variation by evaluating 23 morphological traits according to the international descriptor. Similar diversity pattern of the measured characteristics was observed in three types of sweet, sweet-sour and sour varieties. The traits shape of fruit base, suckering tendency, vigor of ...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Clustering gene expression data using random forest dissimilarity

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. Data of this kind typically involve a large number of variables (genes) compared with the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, which is calculated by a distance function. As increa...

Application of clustering methods in DNA microarrays

Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the study by Van’t Veer et al. dealing with BRCA1...

Journal:
  • Computational Statistics & Data Analysis

Volume 50

Publication date: 2006